Data Science and visualisation: introduction

Etienne Côme

October 17, 2024

Data Science?

The next sexy job

The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it, that’s going to be a hugely important skill.

– Hal Varian, Google

Data Science?

Data science, as it’s practiced, is a blend of Red-Bull-fueled hacking and espresso-inspired statistics.

Data science is the civil engineering of data. Its acolytes possess a practical knowledge of tools & materials, coupled with a theoretical understanding of what’s possible.

– Mike Driscoll, CEO of Metamarkets

Drew Conway’s Data Science Venn Diagram

Data Science?

A data scientist is someone who can obtain, scrub, explore, model and interpret data, blending hacking, statistics and machine learning. Data scientists not only are adept at working with data, but appreciate data itself as a first-class product.

– Hilary Mason, chief scientist at bit.ly

Data Science?

Talking about data also brings to mind the data scientist, that five-legged sheep of the data world who combines statistical and computing skills with a perfect understanding of the company’s business stakes… Is this figure also a fantasy of the prevailing discourse on big data?

Data Science?

While some profiles may come close to this description, in practice data science, like science in general, is most often done in a group rather than alone. (…)

Another little-known fact about data science is that it is first and foremost a craft. Each problem and each dataset requires a specific approach that cannot be industrialized, which many people still don’t understand.

A fashion with ancient origins


Johannes Kepler

A fashion with ancient origins


Charles Joseph Minard

A fashion with ancient origins


William Sealy Gosset (Student)

Key competencies

1. Prepare data (DB)

Retrieve, mix, enrich, filter, clean, verify, format, transform data…

2. Models (ML/Stats)

Decision tree, regression, clustering, graphical model, SVM…

3. Interpret/share (Visualisation)

Graphics, Data visualization, Maps…

Key competencies

1. Prepare data (DB) – 80% of the job

Retrieve, mix, enrich, filter, clean, verify, format, transform data…

2. Implement a method or a model (ML/Stats)

Decision tree, regression, clustering, graphical model, SVM…

3. Interpret/share (Visualisation) – 80% of the job

Graphics, Data visualization, Maps…

Key competencies

1. Data Munging

Retrieve, mix, enrich, filter, clean, verify, format, transform data

2. Statistics

Traditional data analysis

3. Visualisation

Graphics, Data visualization, Maps…
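
As a rough illustration of how the three competencies chain together, here is a minimal sketch. It is written in plain Python with only the standard library, whereas the course itself works in R with dplyr and ggplot. The records, the field names (station, riders) and the helper functions are all invented for the example; they are assumptions, not course material.

```python
from collections import defaultdict
from statistics import mean

# Toy records invented for illustration (hypothetical fields, not real data).
raw = [
    {"station": " Chatelet ", "riders": "120"},
    {"station": "chatelet", "riders": "80"},
    {"station": "Nation", "riders": ""},  # missing value, dropped below
    {"station": "Nation", "riders": "60"},
]


def munge(rows):
    """Step 1, data munging: verify, filter, clean and format the records."""
    clean = []
    for row in rows:
        value = row["riders"].strip()
        if not value.isdigit():
            continue  # filter out records without a usable count
        clean.append({
            "station": row["station"].strip().title(),  # normalise names
            "riders": int(value),                        # text -> number
        })
    return clean


def summarise(rows):
    """Step 2, statistics: mean number of riders per station."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["station"]].append(row["riders"])
    return {station: mean(values) for station, values in groups.items()}


def text_bars(stats, unit=10):
    """Step 3, visualisation: one text bar per station, one '#' per `unit`."""
    return [
        f"{station:<10}{'#' * round(value / unit)}"
        for station, value in sorted(stats.items())
    ]


clean_rows = munge(raw)
stats = summarise(clean_rows)
chart = text_bars(stats)
print("\n".join(chart))
```

Even in this toy case most of the code sits in the first step, which is the point of the "80% of the job" remark on the previous slide.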

Course Outline

  • data manipulation in R with dplyr
  • introduction to visualization, good practices & common mistakes
  • ggplot and the grammar of graphics
  • spatial data
  • introduction to cartography

Some projects

http://www.comeetie.fr/galerie/leboncoin/

Some projects

https://www.comeetie.fr/galerie/francepixels2023/

Some projects

https://www.comeetie.fr/galerie/sankeystif/

Smart-card data analysis

Metro load prediction
